(57) ABSTRACT

An apparatus and method for profiling candidate reuse 
regions and candidate load instructions aids in the selection 
of computation reuse regions and computation reuse instruc- 
tions with good reuse qualities. Registers holding input 
values for candidate reuse regions are sampled periodically 
when the candidate reuse region is encountered. The register
contents are combined into set-values. When a relatively
small number of set-values account for a large percentage of
occurrences, the candidate reuse region may be a good 
computation reuse region. Load instructions are profiled for 
the location accessed and the value loaded. The location and 
value are combined into location-values. The relative
occurrence frequency of location-values can be used to evaluate
load instructions as candidate instructions for reuse. 
number of criteria. One such candidate reuse region is
shown as candidate reuse region 100 in FIG. 1. 

In action 720, the software program code is instrumented
for profiling. Instrumenting for profiling includes inserting
instructions in the program that profile top set-values and top
location-values. In some embodiments, every time a candidate
reuse region is encountered, the instrumented code
profiles a set-value for the candidate reuse region. In other
embodiments, a sampling filter is employed, such as filter
504 (FIG. 5A), and only one of every "S" set-values is
profiled.
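
By way of illustration only, the sampling filter described above may be sketched as follows. The class name SetValueProfiler, the method name on_region_entry, and the use of a Python Counter are assumptions made for this sketch and are not taken from the described embodiments; filter 504 may be implemented quite differently.

```python
from collections import Counter


class SetValueProfiler:
    """Hypothetical profiler: records one of every `period` set-values."""

    def __init__(self, period=1):
        if period < 1:
            raise ValueError("period must be >= 1")
        self.period = period      # "S" in the description
        self.encounters = 0       # times the region entry was reached
        self.samples = 0          # times a set-value was actually profiled
        self.counts = Counter()   # set-value -> number of samples

    def on_region_entry(self, input_registers):
        """Stand-in for the code inserted at a candidate reuse region entry."""
        self.encounters += 1
        if self.encounters % self.period != 0:
            return False          # filtered out, nothing is profiled
        self.samples += 1
        self.counts[tuple(input_registers)] += 1
        return True


# Profile a region whose first input register alternates between 0 and 1.
profiler = SetValueProfiler(period=4)
for i in range(12):
    profiler.on_region_entry((i % 2, 7))
```

In this run the period of 4 happens to align with the alternating input, so only the set-value (1, 7) is ever profiled. This suggests that the sampling period may need to be chosen with some care.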

In action 730, the instrumented code is executed and the
profile data is gathered. As a result, profiling data structures,
such as profiling data structure 400 (FIG. 4), and profiling
data structure 650 (FIG. 6D) are generated. In action 740,
the probability of occurrence of a top set-value is determined
as the ratio of the number of times the top set-value was
collected to the total number of times set-values were
sampled. When a small number of top set-values represent
a large percentage of the execution of the candidate reuse
region, then the candidate reuse region will likely make for
a good computation reuse region.
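
The ratio described for action 740 may be sketched, for illustration only, as follows. The function name top_set_value_probabilities and the sample counts are hypothetical, and the Counter merely stands in for a profiling data structure such as data structure 400.

```python
from collections import Counter


def top_set_value_probabilities(counts, n):
    """Return the n most frequent set-values with their occurrence ratios."""
    total = sum(counts.values())   # total number of sampled set-values
    if total == 0:
        return []
    return [(value, count / total) for value, count in counts.most_common(n)]


# Hypothetical profile: four distinct set-values over 100 samples.
counts = Counter({(3, 9): 70, (4, 9): 20, (5, 1): 6, (8, 2): 4})
top = top_set_value_probabilities(counts, 2)
coverage = sum(probability for _, probability in top)
```

Here the two top set-values together cover roughly nine tenths of the samples, which is the kind of skew the description associates with a good computation reuse region.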

In action 750, the candidate reuse region is used to form
a computation reuse region if appropriate criteria are met.
One such criterion is that the probability of occurrence of a
small number of top set-values exceeds a threshold. A
candidate reuse region can be used by itself or can be
combined with other candidate reuse regions to form a
computation reuse region.
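
One possible form of the threshold criterion of action 750 is sketched below, for illustration only. The function name is_good_reuse_region and the particular values of top_n and threshold are assumptions of this sketch; the description does not fix either value.

```python
def is_good_reuse_region(counts, top_n, threshold):
    """True if the top_n set-values cover more than `threshold` of samples.

    `counts` maps each set-value to the number of times it was sampled.
    """
    total = sum(counts.values())
    if total == 0:
        return False               # nothing was sampled, so no evidence
    covered = sum(sorted(counts.values(), reverse=True)[:top_n])
    return covered / total > threshold


# A skewed profile: two set-values dominate the samples.
skewed = {"a": 70, "b": 20, "c": 10}
# A flat profile: every sample produced a different set-value.
flat = {i: 1 for i in range(100)}

skewed_ok = is_good_reuse_region(skewed, top_n=2, threshold=0.8)
flat_ok = is_good_reuse_region(flat, top_n=2, threshold=0.8)
```

Under these assumed parameters the skewed profile passes and the flat profile does not.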

FIG. 8 shows a processing system. Processing system 800
includes processor 820 and memory 830. In some embodiments,
processor 820 is a processor capable of executing
instrumented software for profiling top set-values and top
location-values. Processor 820 can also be a processor
capable of selecting good computation reuse regions from
candidate reuse regions. Processing system 800 can be a
personal computer (PC), mainframe, handheld device, portable
computer, set-top box, or any other system that
includes software. In some embodiments, the processor
includes one or more predicate registers 840.

In some embodiments, processor 820 includes cache 
memory, a memory controller, or a combination of the two. 
In these embodiments, processor 820 may access a profile 
indicator data structure without accessing memory 830. In 
other embodiments, profile indicators are maintained within
memory 830, and processor 820 accesses memory 830 when 
updating profile indicators regardless of whether processor 
820 includes cache memory or memory controllers. 

Memory 830 can be a random access memory (RAM), 
read only memory (ROM), flash memory, hard disk, floppy
disk, CDROM, or any other type of machine medium 
readable by processor 820. Memory 830 can store instruc- 
tions for performing the execution of the various method 
embodiments of the present invention. 

CONCLUSION 

A software profiling mechanism that gathers and profiles 
top set-values and top location-values has been described. 
Software to be profiled is instrumented with instructions that
sample set-values at the occurrence of candidate reuse 
regions and sample location-values at the occurrence of 
candidate load instructions. Set-values and location-values 
can be generated as concatenated values, or can be combined 
using mechanisms such as exclusive-or operators. When a
small number of top set-values account for a large percentage
of occurrences, the candidate reuse region may make a
good computation reuse region. Likewise, when a small
number of top location-values account for a large percentage 
of occurrences of candidate load instructions, the candidate 
load instruction may make a good candidate for inclusion in 
a computation reuse region. 
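
The two combining mechanisms mentioned above, concatenation and exclusive-or, may be sketched as follows for illustration only. The function names, the assumed register width REG_BITS, and the treatment of a location-value as a concatenation of address and loaded value are assumptions of this sketch.

```python
from functools import reduce

REG_BITS = 64  # assumed register width; not specified by the description


def concat_set_value(registers, bits=REG_BITS):
    """Concatenate register values into one wide integer (lossless)."""
    mask = (1 << bits) - 1
    value = 0
    for reg in registers:
        value = (value << bits) | (reg & mask)
    return value


def xor_set_value(registers, bits=REG_BITS):
    """Combine register values with exclusive-or (compact, may collide)."""
    mask = (1 << bits) - 1
    return reduce(lambda acc, reg: acc ^ (reg & mask), registers, 0)


def location_value(address, loaded, bits=REG_BITS):
    """Combine a load's accessed location and loaded value into one value."""
    return concat_set_value((address, loaded), bits)


# With 8-bit fields, registers (1, 2) concatenate to 0x0102.
wide = concat_set_value((1, 2), bits=8)
narrow = xor_set_value((0b1100, 0b1010))
loc = location_value(0x10, 0xFF, bits=8)
```

Concatenation preserves every input bit but grows with the number of registers, whereas exclusive-or keeps the result one register wide at the cost of mapping distinct inputs, such as (1, 2) and (2, 1), to the same value.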

It is to be understood that the above description is 
intended to be illustrative, and not restrictive. Many other 
embodiments will be apparent to those of skill in the art 
upon reading and understanding the above description. The 
scope of the invention should, therefore, be determined with
reference to the appended claims, along with the full scope
of equivalents to which such claims are entitled. 

What is claimed is: 

1. A computer-implemented method comprising: 
identifying a candidate reuse region of a software pro- 
gram; 

determining an input set for the candidate reuse region, 
wherein the input set includes a plurality of input 
registers for storing input values of the candidate reuse 
region; 

instrumenting the software program to, when executed, 
sample set-values for the input set, wherein each set-value
includes an input register value for each of the
plurality of input registers; 

executing the instrumented software; 

tracking, during the execution, a number of times a 
set-value is encountered; and 

selecting, based on the tracking, the candidate reuse 
region as a computation reuse region. 

2. The computer-implemented method of claim 1, wherein
the input-set comprises a plurality of input registers, and 
each set-value comprises an input register value for each of 
the plurality of input registers, and wherein the instrument- 
ing of the software program includes, 

inserting combine instructions into the software program,
the combine instructions which, when executed, will
combine each of the input register values into a single 
value; and 

inserting index instructions into the software program, the 
index instructions which, when executed, will index 
into a data structure of profile indicators using the 
single value.

3. The computer implemented method of claim 1, wherein 
the instrumenting of the software program includes inserting 
profile instructions to profile the top N occurring set-values, 
where N is based on a function of an expected number of 
reuse instances. 

4. A machine readable medium including instructions,
which when executed by a machine, cause the machine to 
perform operations according to the computer implemented 
method of claim 1. 

5. The machine readable medium of claim 4, wherein, 
during the execution, the sampling is performed every S 
occurrences of the set-values, and wherein S is an integer 
greater than 1.

6. The machine readable medium of claim 4 further 
including instructions, which when executed by a machine,
cause the machine to, for each set-value, combine each of 
the input register values into a single value. 

7. The computer implemented method of claim 1, wherein 
during the execution, the sampling is performed every S 
occurrences of the set-values, and wherein S is an integer
greater than 1.

8. The computer implemented method of claim [illegible] further
comprising, for each set-value, combining each of the input
register values into a single value.

9. The computer-implemented method of claim 8,
wherein the combining of each of the input register values 
into a single value includes: 

folding each of the input register values to create folded
values; and

concatenating the folded values.

10. A computer-implemented method comprising: 
determining whether a software program region is a
computation reuse region, wherein the determining
includes,

periodically sampling a set of registers to obtain register 
values, wherein the register values are input values of 
the software program region;

combining the register values into a single set-value; 

determining an occurrence frequency of the single set- 
value; and 

storing the occurrence frequency and the single set-value 
in a data structure; 

basing the determination of whether the software program
region is the computation reuse region on the occur- 
rence frequency. 

11. The computer-implemented method of claim 10,
wherein the periodically sampling of the set of registers
includes sampling ones of the set of registers to obtain a
set-value every S occurrences of the software program
region, wherein S is a sampling period, wherein S is greater
than 1, and wherein S is chosen so that a statistically valid
number of registers are sampled.

12. The computer-implemented method of claim 11 further
comprising:

identifying a group of control equivalent candidate region 
entries and candidate load instructions; 

inserting predicate instructions prior to ones of the group, 
wherein the predicate instructions set a predicate reg- 
ister every S occurrences; and 

inserting profiling instructions at each of the control 
equivalent candidate region entries and candidate load 
instructions, wherein the profiling instructions are 
predicated on the predicate register. 

13. The computer-implemented method of claim 11, 
wherein the storing includes, 

accessing a record in the data structure as a function of the
set-value; and 

incrementing a profile indicator associated with the 
record. 

14. The computer-implemented method of claim 11, 
wherein the periodically sampling of the set of registers 
further includes sampling, at the beginning of a candidate 
reuse region, set-values in ones of the set of registers, the 
plurality of registers being input registers to the candidate 
reuse region. 

15. A computer-implemented method comprising: 
identifying a candidate load instruction in a software
program;

instrumenting the software program to, when executed, 
sample a location-value every S occurrences of the 
candidate load instruction, wherein S is an integer 
greater than 1;

storing an occurrence frequency of the location-value into 
a data structure; and 

executing the software program. 

16. The computer-implemented method of claim 15,
wherein the instrumenting of the software program includes, 

inserting count instructions in the software program to 
count a number of times the location-value is sampled;
and

inserting track instructions in the software program to
keep track of top location-values.

17. The computer-implemented method of claim 16,
wherein the candidate region includes a plurality of candidate
load instructions, each of the plurality of load instructions
being predicated on a common predicate register.

18. The computer-implemented method of claim 16, 
wherein the inserting of the track instructions to keep track 
of top location-values includes inserting sampling instructions
configured to profile the top N occurrences of location-values,
where N is an integer.

19. The computer-implemented method of claim 15 further
comprising:

identifying a group of control equivalent candidate region
entries and candidate load instructions in the software
program;

inserting predicate instructions in the software program
prior to ones of the group, wherein the predicate
instructions set a predicate register every S occurrences;
and

inserting profiling instructions in the software program at 
each of the control equivalent candidate region entries 
and candidate load instructions, wherein the profiling 
instructions are predicated on the predicated register.

20. A machine readable medium including instructions,
which when executed by a machine, cause the machine to 
perform operations according to the computer implemented 
method of claim 15. 

21. The machine readable medium of claim 20, wherein
the instrumenting of the software includes inserting count
instructions in the software to count a number of times the
location-value is encountered.

22. The machine-readable medium of claim 20, wherein
the instrumenting of the software includes inserting track
instructions in the software program to keep track of top
location-values.

23. The computer-implemented method of claim 15,
wherein S is chosen so that a statistically valid number of 
location-values are sampled. 

24. A computer-implemented method comprising:

selecting candidate reuse regions within a software pro- 
gram; and 

selecting reuse regions from the candidate reuse regions, 
the selecting of the reuse regions including, 
periodically sampling set-values for ones of the candidate
reuse regions to produce a probability of occurrence of
top set-values, wherein each of the set-values includes
values of input registers for one of the candidate reuse
regions; and

basing the selection of the reuse regions on the probability
of occurrence of the top set-values.

25. The computer-implemented method of claim 24,
wherein sampling the set-values includes, representing each
set-value as a single value; and accessing a data structure as
a function of the single value to modify a profile indicator.

26. The computer-implemented method of claim 25, 
wherein the data structure is at least as large as a number of
expected reuse instances. 

27. The computer-implemented method of claim 24, 
wherein selecting the reuse regions further includes marking
as reuse regions those candidate reuse regions having a finite
number of set-values that have a probability of occurrence
greater than a threshold.

28. A machine readable medium including instructions, 
which when executed by a machine, cause the machine to
perform operations according to the computer implemented
method of claim 24.



